Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep Learning Approaches
Radar sensors can be used to analyze the frequency shifts induced by
micro-motions in both the velocity and range dimensions, identified as
micro-Doppler (μ-D) and micro-Range (μ-R), respectively.
Different moving targets have unique μ-D and
μ-R signatures that can be used for target classification.
Such classification can be used in numerous fields, such as gait recognition,
safety and surveillance. In this paper, a 25 GHz FMCW Single-Input
Single-Output (SISO) radar is used in industrial safety for real-time
human-robot identification. Due to the real-time constraint, joint
Range-Doppler (R-D) maps are directly analyzed for our classification problem.
Furthermore, a comparison is presented between conventional classical learning
approaches with handcrafted features, ensemble classifiers, and deep
learning approaches. For the ensemble classifiers, restructured range
and velocity profiles are passed directly to ensemble trees, such as gradient
boosting and random forests, without feature extraction. Finally, a Deep
Convolutional Neural Network (DCNN) is used, with raw R-D images fed directly
into the constructed network. The DCNN shows a superior performance of 99%
accuracy in identifying humans from robots on a single R-D map. Comment: 6 pages, accepted in IEEE Radar Conference 201
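The idea of passing restructured profiles straight into an ensemble classifier, without handcrafted features, can be sketched as below. This is a minimal illustration, not the paper's pipeline: the synthetic "profiles" (broad Doppler spread for humans, a narrow rigid-body peak for robots) and all parameters are assumptions.

```python
# Minimal sketch: flattened velocity profiles fed directly to a random forest.
# The synthetic profiles below are stand-ins for real radar data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def synth_profile(is_human, n_bins=64):
    # Hypothetical signature: humans show a broad micro-Doppler spread,
    # robots a narrow rigid-body peak.
    base = np.zeros(n_bins)
    center, width = n_bins // 2, (10 if is_human else 2)
    base[center - width:center + width] = rng.random(2 * width)
    return base + 0.05 * rng.random(n_bins)

X = np.array([synth_profile(i % 2 == 0) for i in range(200)])
y = np.array([i % 2 for i in range(200)])  # 0 = human, 1 = robot

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[:150], y[:150])
accuracy = clf.score(X[150:], y[150:])
```

On such cleanly separable toy data the forest reaches near-perfect accuracy; the point is only that no feature extraction step precedes the trees.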
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Convolution-augmented transformers (Conformers) have recently been proposed for
various speech-domain applications, such as automatic speech recognition (ASR)
and speech separation, as they can capture both local and global dependencies.
In this paper, we propose a conformer-based metric generative adversarial
network (CMGAN) for speech enhancement (SE) in the time-frequency (TF) domain.
The generator encodes the magnitude and complex spectrogram information using
two-stage conformer blocks to model both time and frequency dependencies. The
decoder then decouples the estimation into a magnitude mask decoder branch to
filter out unwanted distortions and a complex refinement branch to further
improve the magnitude estimation and implicitly enhance the phase information.
Additionally, we include a metric discriminator to alleviate metric mismatch by
optimizing the generator with respect to a corresponding evaluation score.
Objective and subjective evaluations illustrate that CMGAN shows
superior performance compared to state-of-the-art methods in three speech
enhancement tasks (denoising, dereverberation and super-resolution). For
instance, quantitative denoising analysis on the Voice Bank+DEMAND dataset
indicates that CMGAN outperforms various previous models by a clear margin, i.e.,
PESQ of 3.41 and SSNR of 11.10 dB. Comment: 16 pages, 10 figures and 5 tables. arXiv admin note: text overlap
with arXiv:2203.1514
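The metric-discriminator idea above can be illustrated with toy numbers. This is a hedged sketch of the general metric-GAN recipe, not the paper's code: the function names, the PESQ normalization, and all numeric values are assumptions.

```python
# Toy sketch of a metric-GAN objective: the discriminator D is trained to
# predict a normalized evaluation score, and the generator is optimized so
# that D rates its output as a perfect score of 1.

def normalize_pesq(pesq):
    # PESQ ranges roughly from -0.5 to 4.5; map it to [0, 1].
    return (pesq + 0.5) / 5.0

def discriminator_loss(d_clean, d_enh, true_norm_pesq):
    # D should output 1 for clean/clean pairs and the measured
    # normalized PESQ for clean/enhanced pairs.
    return (d_clean - 1.0) ** 2 + (d_enh - true_norm_pesq) ** 2

def generator_metric_loss(d_enh):
    # The generator pushes D's prediction for its output toward 1,
    # i.e. toward the best achievable evaluation score.
    return (d_enh - 1.0) ** 2

# Toy numbers: D currently rates the enhanced speech at 0.7 while its
# measured normalized PESQ is 0.65 (raw PESQ 2.75).
d_loss = discriminator_loss(d_clean=0.95, d_enh=0.7,
                            true_norm_pesq=normalize_pesq(2.75))
g_loss = generator_metric_loss(d_enh=0.7)
```

Because the generator's loss is defined on the discriminator's score prediction rather than on a waveform distance, it directly targets the evaluation metric, which is what alleviates the metric mismatch.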
CMGAN: Conformer-based Metric GAN for Speech Enhancement
Recently, the convolution-augmented transformer (Conformer) has achieved
promising performance in automatic speech recognition (ASR) and time-domain
speech enhancement (SE), as it can capture both local and global dependencies
in the speech signal. In this paper, we propose a conformer-based metric
generative adversarial network (CMGAN) for SE in the time-frequency (TF)
domain. In the generator, we utilize two-stage conformer blocks to aggregate
all magnitude and complex spectrogram information by modeling both time and
frequency dependencies. The estimation of magnitude and complex spectrogram is
decoupled in the decoder stage and then jointly incorporated to reconstruct the
enhanced speech. In addition, a metric discriminator is employed to further
improve the quality of the enhanced estimated speech by optimizing the
generator with respect to a corresponding evaluation score. Quantitative
analysis on the Voice Bank+DEMAND dataset indicates the capability of CMGAN to
outperform various previous models by a clear margin, i.e., PESQ of 3.41 and
SSNR of 11.10 dB. Comment: 5 pages, 1 figure, 2 tables, submitted to INTERSPEECH 202
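The decoupled magnitude/complex estimation described above can be sketched in a few lines of NumPy. The shapes, mask value, and residual scale below are toy stand-ins, not the model's actual tensors.

```python
# Sketch of decoupled spectrogram enhancement: a magnitude mask scales the
# noisy spectrogram while keeping the noisy phase, and a complex residual
# then refines real and imaginary parts jointly.
import numpy as np

rng = np.random.default_rng(1)
noisy = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))

mag, phase = np.abs(noisy), np.angle(noisy)
mask = np.clip(0.8 * np.ones_like(mag), 0.0, 1.0)   # stand-in for the mask branch
residual = 0.05 * (rng.standard_normal(mag.shape)
                   + 1j * rng.standard_normal(mag.shape))

# Masked magnitude keeps the noisy phase; adding the complex residual then
# implicitly adjusts both magnitude and phase, as the abstract describes.
enhanced = mask * mag * np.exp(1j * phase) + residual
```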
Stairs Detection for Enhancing Wheelchair Capabilities Based on Radar Sensors
Powered wheelchair users encounter barriers to their mobility every day.
Entering a building with non-barrier-free areas can massively impact the user's
mobility-related activities. There are a few commercial and some experimental
devices that can climb stairs using, for instance, adaptive wheels with
joints or a caterpillar drive. These systems rely on the user for sensing and
control. For safe automated obstacle crossing, a robust and environment-invariant
detection of the surroundings is necessary. Radar may prove to be a
suitable sensor due to its capability to handle harsh outdoor environmental
conditions. In this paper, we introduce a mirror-based two-dimensional
Frequency-Modulated Continuous-Wave (FMCW) radar scanner for stair detection. A
radar-image-based stair dimensioning approach is presented and tested under
laboratory and realistic conditions. Comment: 5 pages, accepted and presented in 2017 IEEE 6th Global Conference on
Consumer Electronics (GCCE 2017)
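A stair dimensioning step can be illustrated on an idealized height profile. This is not the paper's algorithm, only a sketch under the assumption that each scan point is a (horizontal distance, height) pair and step edges appear as jumps in the profile.

```python
# Illustrative stair dimensioning: estimate riser height and tread depth
# from an idealized 2-D height profile extracted from a radar scan.
import numpy as np

# Synthetic staircase: 0.30 m tread depth, 0.17 m riser height.
tread, riser = 0.30, 0.17
x = np.arange(0.0, 0.9, 0.01)                # horizontal distance [m]
z = riser * np.floor(x / tread + 1)          # piecewise-constant height [m]

# Step edges show up as jumps in the height profile.
jumps = np.flatnonzero(np.diff(z) > 0.05)
riser_est = float(np.mean(z[jumps + 1] - z[jumps]))   # mean jump height
tread_est = float(np.mean(np.diff(x[jumps])))         # mean edge spacing
```

On the noise-free profile the estimates recover the true 0.17 m and 0.30 m; on real radar images the edge detection would of course need to be robust to clutter.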
AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks
Automatic speech recognition (ASR) systems are of vital importance nowadays
in commonplace tasks such as speech-to-text processing and language
translation. This has created the need for ASR systems that can operate in
realistic crowded environments. Thus, speech enhancement is a valuable building
block in ASR systems and other applications such as hearing aids, smartphones
and teleconferencing systems. In this paper, a generative adversarial network
(GAN) based framework is investigated for the task of speech enhancement, more
specifically speech denoising of audio tracks. A new architecture based on
CasNet generator and an additional feature-based loss are incorporated to get
realistically denoised speech phonetics. Finally, the proposed framework is
shown to outperform other learning and traditional model-based speech
enhancement approaches. Comment: 5 pages, 4 figures and 2 tables. Accepted in EUSIPCO 202
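The feature-based loss mentioned above can be sketched as a standard GAN feature-matching loss: intermediate activations of clean and denoised speech are compared layer by layer. The arrays below are toy stand-ins for discriminator feature maps, and the specific recipe is an assumption, not the paper's exact loss.

```python
# Sketch of a feature-matching loss: compare intermediate "feature maps"
# of clean and denoised speech instead of raw waveforms only.
import numpy as np

def feature_matching_loss(feats_clean, feats_denoised):
    # Mean absolute difference per layer, averaged over layers.
    return float(np.mean([np.mean(np.abs(c - d))
                          for c, d in zip(feats_clean, feats_denoised)]))

rng = np.random.default_rng(2)
feats_clean = [rng.standard_normal((8, 16)) for _ in range(3)]
feats_denoised = [f + 0.1 * rng.standard_normal(f.shape) for f in feats_clean]

loss = feature_matching_loss(feats_clean, feats_denoised)
```

Matching intermediate features rather than raw samples is what pushes the generator toward perceptually plausible, "realistically denoised" phonetics.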
Person Identification and Body Mass Index: A Deep Learning-Based Study on Micro-Dopplers
Obtaining smart surveillance requires a sensing system that can capture
accurate and detailed information about the human walking style. Radar
micro-Doppler (μ-D) analysis has proved to be a reliable metric
for studying human locomotion. Thus, μ-D signatures can be
used to identify humans based on their walking styles. Additionally, the
signatures contain information about the radar cross section (RCS) of the
moving subject. This paper investigates the effect of human body
characteristics on human identification based on their μ-D
signatures. In our proposed experimental setup, a treadmill is used to collect
μ-D signatures of 22 subjects with different genders and body
characteristics. Convolutional autoencoders (CAE) are then used to extract the
latent-space representation from the μ-D signatures, which is
then visualized in two dimensions using t-distributed stochastic neighbor
embedding (t-SNE). Our study shows that the body mass index (BMI)
correlates with the μ-D signature of the walking subject. A
50-layer deep residual network is then trained to identify the walking subject
based on the μ-D signature. We achieve an accuracy of 98% on
the test set with high signal-to-noise ratio (SNR) and 84% across varying
SNR levels. Comment: Accepted in IEEE Radarconf1
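The embedding step above, compressing each signature to a latent vector and projecting it to two dimensions, can be sketched as follows. For brevity this uses PCA via SVD in place of the paper's autoencoder + t-SNE pipeline, on synthetic signatures; every shape and value is an assumption.

```python
# Sketch of the visualization step: project high-dimensional signatures
# to 2-D. PCA (via SVD) stands in here for the CAE + t-SNE pipeline.
import numpy as np

rng = np.random.default_rng(3)
# 22 "subjects", each a flattened 32x32 micro-Doppler signature.
signatures = rng.standard_normal((22, 32 * 32))

# Center the data and project onto the top-2 principal components.
centered = signatures - signatures.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
embedding_2d = centered @ vt[:2].T   # one 2-D point per subject
```

In the paper's setup, clustering of these 2-D points by BMI is what reveals the correlation between body characteristics and the μ-D signature.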
ipA-MedGAN: Inpainting of Arbitrary Regions in Medical Imaging
Local deformations in medical modalities are common phenomena due to a
multitude of factors, such as metallic implants or limited fields of view in
magnetic resonance imaging (MRI). Completion of the missing or distorted
regions is of special interest for automatic image analysis frameworks to
enhance post-processing tasks such as segmentation or classification. In this
work, we propose a new generative framework for medical image inpainting,
titled ipA-MedGAN. It bypasses the limitations of previous frameworks by
enabling inpainting of arbitrarily shaped regions without prior localization of
the regions of interest. Thorough qualitative and quantitative comparisons with
other inpainting and translational approaches illustrate the superior
performance of the proposed framework for the task of brain MR inpainting. Comment: Submitted to IEEE ICIP 202
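The arbitrary-region setup can be illustrated with a toy masking sketch. This is not ipA-MedGAN itself: the "inpainter" below is a trivial placeholder, and all shapes and values are assumptions; it only shows how an arbitrary binary mask defines which pixels the network must fill.

```python
# Toy arbitrary-region inpainting setup: a binary mask marks distorted
# pixels; the network sees the masked image, and only masked pixels are
# replaced by the prediction.
import numpy as np

rng = np.random.default_rng(4)
image = rng.random((16, 16))          # stand-in for an MR slice
mask = rng.random((16, 16)) < 0.2     # True = missing/distorted region

masked_input = np.where(mask, 0.0, image)
prediction = np.full_like(image, image[~mask].mean())  # trivial "inpainter"

# Compose: keep known pixels, fill masked ones with the prediction.
inpainted = np.where(mask, prediction, image)
```

Because the mask can take any shape, no prior localization of the region of interest is required, which is the limitation the framework bypasses.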